Processing Inequality Queries Based on Generalized Semi-Joins
Authors
Abstract
Bernstein and Goodman showed that natural inequality (NI) queries can be processed efficiently by semi-joins if there are neither multiple inequality join edges nor cycles with one or zero doublets. This paper gives procedures that handle these cases efficiently. Multiple inequality join edges can be processed by multi-attribute inequality semi-joins. Two procedures based on generalized semi-joins are developed for cyclic NI queries (with one or zero doublets).
The semi-join is a useful operation for reducing processing cost in distributed databases and database machines [BERNC8101] [BERNG8111]. Its processing power is limited, however, because not all queries can be solved using semi-joins alone. Among queries consisting of natural joins of relations (called NJ (natural join) queries), those in the class called tree queries can be solved using semi-joins only, but the remaining queries (called cyclic queries) cannot [BERNG8111]. We have introduced generalized semi-joins and developed procedures for cyclic queries that use them [KAMBY8206]. We have also developed several methods that transform cyclic queries into tree queries by utilizing data dependencies [KAMBY8305]. Using these processing methods, cyclic queries can also be solved ...
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the Tenth International Conference on Very Large Data Bases.
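To make the semi-join terminology in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the procedures developed in the paper: relations are modeled as lists of dictionaries, the inequality is taken to be strict `<`, and the function names (`semi_join`, `inequality_semi_join`, `multi_attribute_inequality_semi_join`) and the sample data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]
Relation = List[Row]


def semi_join(r: Relation, s: Relation, attrs: Sequence[str]) -> Relation:
    """Equality semi-join: keep rows of r that agree with some row of s on attrs."""
    keys = {tuple(u[a] for a in attrs) for u in s}
    return [t for t in r if tuple(t[a] for a in attrs) in keys]


def inequality_semi_join(r: Relation, s: Relation, a: str, b: str) -> Relation:
    """Keep rows t of r for which some row u of s satisfies t[a] < u[b].

    With a single inequality, only the maximum of s.b is needed to decide this.
    """
    if not s:
        return []
    max_b = max(u[b] for u in s)
    return [t for t in r if t[a] < max_b]


def multi_attribute_inequality_semi_join(
    r: Relation, s: Relation, pairs: Sequence[Tuple[str, str]]
) -> Relation:
    """Keep rows t of r for which ONE row u of s satisfies t[a] < u[b] for every pair.

    Naive nested-loop version (assumption: clarity matters more than cost here).
    """
    return [t for t in r if any(all(t[a] < u[b] for a, b in pairs) for u in s)]


if __name__ == "__main__":
    # Hypothetical sample relations R(A, B) and S(C, D).
    r = [{"A": 5, "B": 9}, {"A": 5, "B": 2}, {"A": 7, "B": 7}]
    s = [{"C": 6, "D": 3}, {"C": 2, "D": 10}]

    # Applying the two single-attribute reductions one after the other.
    separate = inequality_semi_join(
        inequality_semi_join(r, s, "A", "C"), s, "B", "D"
    )
    # Requiring both inequalities to hold against the same row of S.
    combined = multi_attribute_inequality_semi_join(r, s, [("A", "C"), ("B", "D")])
    print(separate)
    print(combined)
```

In this sample, the row with A = 5, B = 9 survives the two single-attribute reductions because each inequality is satisfied by a different row of S, but no single row of S satisfies both, so the multi-attribute version removes it. This is one way to see why multiple inequality join edges between the same pair of relations call for a multi-attribute operation.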
Similar resources
Exploiting Visual Perception for Sampling-Based Approximation on Aggregate Queries
Efficient sampling algorithms have been developed for approximating answers to aggregate queries on large data sets. In some formulations of the problem, concentration inequalities (such as Hoeffding’s inequality) are used to estimate the confidence interval for an approximated aggregate value. Samples are usually drawn until the confidence interval is sufficiently small, regardless of h...
Query Processing for Distance Metrics
In applications such as vision and molecular biology, a common problem is to find the similar objects to a given target (according to some distance measure) in a large database. This paper presents a scheme for query processing in such situations. The basic strategy is to (partially) precompute inter-object distances, and by using the distance information and the triangle inequality, we elimina...
A Dynamic Method for Answering Ad Hoc Continuous Aggregate Queries
Data streams are infinite, fast sequences of time-stamped data elements that arrive in bursts. Generally, these elements need to be processed online and in real time, so algorithms that process data streams and answer queries on them are mostly one-pass. Executing such algorithms raises challenges such as memory limitations, scheduling, and accuracy of answers. They will be more ...
Improvement of the Analytical Queries Response Time in Real-Time Data Warehouse using Materialized Views Concatenation
A real-time data warehouse is a collection of recent and hierarchical data that is used for managers’ decision-making by creating online analytical queries. The volume of data collected from data sources and entered into the real-time data warehouse is constantly increasing. Moreover, as the volume of input data to the real-time data warehouse increases, the interference between online loading ...
Using Triangle Inequality to Efficiently Process Continuous Queries on High-Dimensional Streaming Time Series
In many applications, it is important to quickly find, from a database of patterns, the nearest neighbors of high-dimensional query points that come into the system in a streaming form. Treating each query point as a separate one is inefficient. Consecutive query points are often neighbors in the high-dimensional space, and intermediate results in the processing of one query should help the proc...